Abstract The paper considers a linear regression model with multiple change-points occurring
at unknown times. The LASSO technique is very attractive here since it allows the parametric
estimation, including that of the change-points, and automatic variable selection simultaneously.
The asymptotic properties of the LASSO-type estimator (which has the LASSO estimator as a
particular case) and of the adaptive LASSO estimator are studied. For the latter estimator the
oracle properties are proved. In both cases, a model selection criterion is proposed. Numerical
examples are provided showing the performance of the adaptive LASSO estimator compared to the
LS estimator.
1 Introduction

A change-point model is a model whose form changes at unknown observations. Change-point
detection procedures fall into two categories: retrospective or a posteriori change detection and
on-line, sequential or a priori change detection. This paper focuses on an a posteriori
change-point problem, which arises when the data are completely known at the end of the
experiment to be processed. More precisely, we study the high-dimensional a posteriori
change-point model: a phenomenon (the dependent variable) is studied as a function of a very
large number of regressors, with an unknown number of change-points.

A significant advance in variable selection in a model without change-points was made by
Tibshirani (1996), who proposed the LASSO method. With it, estimation and model selection are
treated simultaneously as a single minimization problem. If the model has change-points, the
LASSO method makes it possible at the same time to estimate the parameters on every segment and
to eliminate the irrelevant predictive covariates without going through a hypothesis test each time.
We remark that the least squares (LS) method gives nonzero estimates to all coefficients.
In this paper we propose a method for a linear model with change-points with the aim of estimating
and choosing the covariates (regressors) simultaneously. The obtained results will be very useful
for high-dimensional models, which are often used in various fields, especially in medicine,
meteorology or financial econometrics.

Concerning the LASSO method in a model without change-points, a generalization of the LASSO
($L_1$ penalty) and ridge ($L_2$ penalty) estimators was given in Knight and Fu (2000) by
minimizing the residual sum of squares plus a penalty on the model parameters. The
obtained estimator is called the LASSO-type estimator. The ridge method has a good prediction
performance through a bias-variance trade-off, while the LASSO method encourages both shrinkage
and automatic variable selection simultaneously and has very good computational properties.
Nevertheless, the LASSO estimator does not satisfy the oracle properties. To remedy this
drawback, an adaptive LASSO estimator was proposed by Zou (2006). Recall that the oracle
properties are: the zero components of the true parameters are estimated (shrunk) as 0 with
probability tending to 1 (also called the sparsity property) and the estimators of the nonzero
components have an optimal estimation rate (and are asymptotically normal).

Let us give some references to recent papers on the LASSO method, with the remark that, to the
author's knowledge, the LASSO problem has not yet been addressed in a change-point model.
For models without change-points, in the case of linear regression, Fan and Li (2001) show
that the LASSO method produces a biased estimator for large regression parameters and Zou
(2006) proves that the oracle properties do not hold for the LASSO. Pötscher and Schneider
(2009) study the distribution of the adaptive LASSO estimator in finite samples and in the
large-sample limit, while Xu and Ying (2010) consider the LASSO-type penalty in a median
regression. In the paper of Foster et al. (2009) a LASSO random effects model is considered and in
Bickel et al. (2009) equivalence results and sparsity oracle inequalities for the LASSO and
Dantzig estimators in a nonparametric regression model are given. Wei et al. (2011) consider
the problem of variable selection and estimation for a linear model with time-varying effects
for covariates using the group LASSO and adaptive group LASSO methods. They proved that
the obtained estimators are consistent and that the adaptive group LASSO estimator has the oracle
selection property.

Given the very interesting properties of these estimators (LASSO-type and adaptive LASSO), we
study their behavior in a model with change-points. In order to estimate the number of
change-points, we also propose a model selection criterion based on the LASSO-type or adaptive
LASSO method. In a multiple change-point model, the break estimation could affect the estimator
properties: variable selection on each segment, oracle properties, and so on. This is the main
interest of this paper. Besides, since the penalty contains the model parameters, the change-point
results of the literature (see e.g. Bai (1998), Bai and Perron (1998), Koul and Qian (2002) or
Ciuperca (2009)) do not apply.

This paper considers the estimation and the selection of the significant variables in a linear
regression with multiple change-points occurring at unknown times. The study of this method
was motivated by the wish to find the properties of an estimator, particularly interesting in a
change-point model with a large number of regressors, which allows the automatic elimination of
the non-significant variables on every phase, without using hypothesis tests.

Two estimation methods are proposed and studied: LASSO-type (with particular cases: the ridge and
LASSO methods) and adaptive LASSO. The first method has the advantage that it can handle
a number of observations smaller than the number of model parameters. But, under certain
conditions on the design (the matrix of observed regressors), the LASSO-type estimators of the
regression parameters are not consistent. In that case we can use the adaptive LASSO method,
which correctly selects the variables with nonzero coefficients with probability converging to one.
On the other hand, in order to calculate the adaptive penalization, this method can be used only if
the number of observations on every segment is larger than the number of parameters in the
corresponding interval.
The difficulty in studying a change-point model, when the number of change-points is fixed, results
first from the dependence of the model on two types of parameters: the regression parameters and
the change-points. Moreover, for multiple breaks, each middle regime has completely unknown
boundaries. For the two estimators we study the asymptotic behavior: convergence rate and
asymptotic distribution. For the adaptive LASSO regression parameter estimator, we also prove
that the presence of change-points in the model does not affect the oracle property: between two
consecutive breaks, the estimator of the nonzero parameters is asymptotically normal and the zero
parameters are shrunk directly to 0 with a probability converging to 1.

If the number of breaks is unknown, the problem of its estimation arises. We propose a general
criterion to estimate the number of change-points.

Finally, numerical simulations are carried out to illustrate the theoretical results and to show the
advantages of the proposed methods in terms of detection of irrelevant variables in a
change-point model and also of estimation of the number of breaks.

In the present paper, as an original contribution, we provide the asymptotic statistical properties
of the LASSO-type and adaptive LASSO estimators in a change-point model. The structure of this
paper is as follows. The model and assumptions are introduced in Section 2. In Section 3, a
LASSO-type estimator in a change-point model is proposed and its asymptotic behavior is studied.
The convergence rate and the asymptotic distribution of the regression parameter and change-point
estimators are obtained. A model selection criterion is also studied. Next, the adaptive LASSO
estimator and its oracle properties are given in Section 4. Section 5 reports some simulation
results which illustrate the theoretical results. Finally, the Appendix contains the proofs of the
results.



2 Model 

We consider the model $Y_i = f_\theta(X_i) + \varepsilon_i$, for the step-function with $K$ ($K > 0$) change-points:

$$f_\theta(X_i) = h_{\phi_1}(X_i) 1_{i \le l_1} + h_{\phi_2}(X_i) 1_{l_1 < i \le l_2} + \dots + h_{\phi_{K+1}}(X_i) 1_{i > l_K}, \qquad i = 1, \dots, n,$$

more precisely $h_\phi(X) = X'\phi$, $\phi \in \Gamma \subset \mathbb{R}^p$, $\Gamma$ compact.
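
As a reading aid, the step-function mean can be evaluated as in the minimal sketch below. The function names `regime_index` and `piecewise_mean` are hypothetical (they do not come from the paper), observations are indexed from 1, and the first regime is taken to contain the indices $i \le l_1$.

```python
def regime_index(i, breaks):
    """Return r in {1, ..., K+1} such that observation i lies in regime r.

    `breaks` is the increasing list (l_1, ..., l_K); regime 1 holds i <= l_1,
    regime r holds l_{r-1} < i <= l_r, and regime K+1 holds i > l_K.
    """
    r = 1
    for l in breaks:
        if i > l:
            r += 1
    return r


def piecewise_mean(i, x_i, phis, breaks):
    """Evaluate f_theta(X_i) = X_i' phi_r for the regime r that contains i."""
    phi = phis[regime_index(i, breaks) - 1]
    return sum(x * c for x, c in zip(x_i, phi))


# Illustrative values only: K = 2 breaks and p = 2 regressors.
breaks = [3, 6]
phis = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
values = [piecewise_mean(i, [1.0, 1.0], phis, breaks) for i in (3, 4, 7)]
```

With these illustrative inputs, observation 3 is still governed by $\phi_1$, observation 4 by $\phi_2$ and observation 7 by $\phi_3$.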

To our knowledge, no result exists in the literature on LASSO estimation in a model
with change-points. In all previous works where estimation by penalization is considered in a
multi-phase model, the penalization does not contain the model parameters (see Ciuperca (2011b))
and thus model selection and estimation cannot be carried out simultaneously. Classically, in
the rich literature on change-point estimation, many articles contribute to determining the
break number and to estimating the change-point locations and the regression parameters (see e.g.
Bai and Perron (1998), Kim and Kim (2008) or Ciuperca (2009)). Once the asymptotic
distribution of the estimator is proved, hypothesis tests may be performed to eliminate the
irrelevant predictive covariates. This approach requires a huge amount of computation if $p$ or $K$
is large. A solution to carry out change-point analysis, perform variable selection and estimate
regression parameters simultaneously was proposed by Wu (2008). Since it is necessary to calculate
all sub-models, of order $2^{p+4}$ when $K = 1$, the criteria considered by Wu (2008) still require
many calculations. On the other hand, if $p$ is large compared to $n$, Wu's criteria need to be
modified.
We can also recall the paper of Harchaoui and Lévy-Leduc (2010), where the estimation of the
location of change-points in a one-dimensional piecewise constant signal observed in white noise is
reformulated as a variable selection problem with an $L_1$ penalty. Note that their model is
constant between two consecutive change-points and the reformulated model is a classical linear
regression without breaks.
For the observation $i$, $Y_i$ denotes the response variable, $X_i$ is a $p$-vector of regressors and
$\varepsilon_i$ is the error. The errors $(\varepsilon_i)_{1 \le i \le n}$ are independent identically distributed (i.i.d.) random
variables. The model parameters are $\theta = (\phi_1, \dots, \phi_{K+1}, l_1, \dots, l_K) \in \Gamma^{K+1} \times \mathbb{N}^K$ and their true
values (unknown) are $\theta^0 = (\phi^0_1, \dots, \phi^0_{K+1}, l^0_1, \dots, l^0_K)$. We set $\theta = (\theta_1, \theta_2)$, with
$\theta_1 = (\phi_1, \dots, \phi_{K+1})$ the regression parameters and $\theta_2 = (l_1, \dots, l_K)$ the change-points.
Denote by $\phi_{r,k}$ the $k$th component of $\phi_r$ and by $\phi^0_{r,k}$ the $k$th component of $\phi^0_r$, for
$r \in \{1, \dots, K+1\}$, $k = 1, \dots, p$.

Let us consider the deterministic design matrix $X = (X_{ij})_{1 \le i \le n,\, 1 \le j \le p}$, $X^j$ its column $j$ and
$X_i$ its $i$th row.

We now state the assumptions under which the asymptotic properties of the estimators will be
derived.

First, we impose the condition that the change-points are sufficiently far apart:
(H1) there exist two positive constants $u$, $c_0$ such that $l_{r+1} - l_r \ge c_0 [n^u]$, for every
$r = 1, \dots, K$, with $l_0 = 1$ and $l_{K+1} = n$.
Without loss of generality, in the following we consider $3/4 < u < 1$ and $c_0 = 1$.

For the design $X$, we suppose that:
(H2) $n^{-1} \max_{1 \le i \le n} X_i' X_i \to 0$ and, for any $r = 1, \dots, K+1$, the matrix
$C_{n,r} = (l_r - l_{r-1})^{-1} \sum_{i=l_{r-1}+1}^{l_r} X_i X_i' \to C_r$, with $C_r$ a non-negative definite matrix.

For the errors $\varepsilon_i$ we suppose that:
(H3) $\varepsilon$ is an absolutely continuous random variable. Moreover $E[\varepsilon_i] = 0$ and $E[\varepsilon_i^2] = \sigma^2$.



Assumption (H1) is standard for a change-point model, see Bai (1998) or Ciuperca (2011a),
while (H2) is imposed when LASSO methods are used, see for example Zou (2006)
or Knight and Fu (2000). Assumption (H3) is classical in a regression model.

The matrices $C_{n,r}$ are non-singular for all $n$ and $r$, while the matrix $C_r$ can be singular. Let
us denote by $C^0_r$ the limiting matrix for the true change-points $l^0_r$, $r = 1, \dots, K+1$. Without
loss of generality, we assume that the regressors are centered.

To complete the model, we make the usual identifiability assumption that adjacent
regressions are different: $\phi_r \neq \phi_{r+1}$, $r = 1, \dots, K$.

Throughout the paper, $c$ denotes a positive generic constant. For a vector $v = (v_1, \dots, v_p)$
let us denote $|v| = (|v_1|, \dots, |v_p|)$ and $|v|^c = (|v_1|^c, \dots, |v_p|^c)$. On the other hand, $\|v\|$ is its
Euclidean norm. All vectors are column vectors and $v'$ denotes the transpose of $v$. For a real $x$,
$[x]$ means the largest integer not larger than $x$.

Apart from these general notations, in every section we shall give the notations used for each
method.



3 LASSO-type estimators 

In this section we define and study the LASSO-type estimators in a change-point model. 
3.1 Notations

Under assumption (H1), between two consecutive change-points $l_{r-1}$ and $l_r$ we consider a positive
sequence $\lambda_{n;(l_{r-1},l_r)} \to \infty$ for $n \to \infty$ to control the amount of regularization applied to the
estimators. Based on the LASSO-type method introduced by Knight and Fu (2000) in a model
without breaks, let us consider the penalized sum:

$$T_n(K, \theta_1, \theta_2) = \sum_{i=1}^{n} \sum_{r=1}^{K+1} \Big[ (Y_i - X_i'\phi_r)^2 + \frac{\lambda_{n;(l_{r-1},l_r)}}{l_r - l_{r-1}} \sum_{j=1}^{p} |\phi_{r,j}|^{\gamma} \Big] 1_{l_{r-1} < i \le l_r}.$$

For the tuning parameter $\lambda_{n;(l_{r-1},l_r)} = O((l_r - l_{r-1})^{1/2})$ and $\gamma > 0$, denote then the minimum of
the penalized sum of squared residuals, for fixed breaks $l_1, \dots, l_K$, by

$$S(l_1, \dots, l_K) = \inf_{\theta_1} T_n(K, \theta_1, \theta_2), \qquad (1)$$

with $l_0 = 1$ and $l_{K+1} = n$. For $K$ fixed, we define the LASSO-type (or Bridge) estimator of
$(\theta_1, \theta_2)$ as a point

$$(\hat\theta^s_{1n}, \hat\theta^s_{2n}) = \arg\min_{(\theta_1, \theta_2)} T_n(K, \theta_1, \theta_2).$$

More exactly, we denote by $\hat\theta^s_{1n} = (\hat\phi^s_{1n}, \dots, \hat\phi^s_{K+1,n})$ the regression parameter estimator and by
$\hat\theta^s_{2n} = (\hat l^s_1, \dots, \hat l^s_K)$ the LASSO-type estimator of the change-points. Obviously

$$\hat\theta^s_{2n} = \arg\min_{(l_1, \dots, l_K) \in \mathbb{N}^K} S(l_1, \dots, l_K).$$

We remark two particular cases: for $\gamma = 2$ we obtain the ridge estimator and for $\gamma = 1$ the
LASSO estimator.
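
For the particular case $\gamma = 1$, the minimization on one fixed segment can be sketched with cyclic coordinate descent and soft-thresholding. This is only an illustration under stated assumptions: the paper does not prescribe a numerical algorithm, the names `soft_threshold` and `lasso_segment` are hypothetical, and the objective is taken as the residual sum of squares plus `lam` times the sum of absolute coefficients (no factor 1/2), which is why the threshold is `lam / 2`.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z towards 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_segment(X, y, lam, n_sweeps=200):
    """Minimise sum_i (y_i - x_i' phi)^2 + lam * sum_k |phi_k| on one segment.

    X is a list of rows (each a list of p floats), y a list of responses.
    Cyclic coordinate descent; illustrative, not optimised.
    """
    n, p = len(X), len(X[0])
    phi = [0.0] * p
    col_ss = [sum(X[i][k] ** 2 for i in range(n)) for k in range(p)]
    for _ in range(n_sweeps):
        for k in range(p):
            if col_ss[k] == 0.0:
                continue  # a null column carries no information
            rho = 0.0
            for i in range(n):
                # fitted value without the contribution of coordinate k
                fit_wo_k = sum(X[i][j] * phi[j] for j in range(p) if j != k)
                rho += X[i][k] * (y[i] - fit_wo_k)
            phi[k] = soft_threshold(rho, lam / 2.0) / col_ss[k]
    return phi


# Illustrative orthogonal design: the small second coefficient is set to zero.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
phi_lasso = lasso_segment(X, y, lam=1.0)
phi_ls = lasso_segment(X, y, lam=0.0)  # lam = 0 gives back the LS fit here
```

On this toy design the LS fit keeps both coefficients nonzero, while the penalized fit shrinks the first one and removes the second, which is the selection effect the method relies on.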

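The construction of the estimators has two stages: first we search for the regression parameter
estimators and then we locate the change-points. Thus, first, between every pair of breaks $l_{r-1}$
and $l_r$, we calculate the LASSO-type estimator of $\phi_r$ by:

$$\hat\phi^s_{(l_{r-1},l_r)} = \arg\min_{\phi} \Big[ \sum_{i=l_{r-1}+1}^{l_r} (Y_i - X_i'\phi)^2 + \lambda_{n;(l_{r-1},l_r)} \sum_{k=1}^{p} |\phi_{,k}|^{\gamma} \Big],$$

with $\hat\phi^s_{(l_{r-1},l_r),k}$ its $k$th component and the corresponding forecast $\hat Y_{(l_{r-1},l_r);i} = X_i' \hat\phi^s_{(l_{r-1},l_r)}$. The
regression parameter estimators are $\hat\theta^s_{1n}(\theta_2) = (\hat\phi^s_{(l_0,l_1)}, \hat\phi^s_{(l_1,l_2)}, \dots, \hat\phi^s_{(l_K,l_{K+1})})$. Afterwards we calculate
the change-point estimators:

$$\hat\theta^s_{2n} = \arg\min_{\theta_2 \in \mathbb{N}^K} T_n(K, \hat\theta^s_{1n}(\theta_2), \theta_2).$$

Also $\hat\theta^s_{1n} = \hat\theta^s_{1n}(\hat\theta^s_{2n})$.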
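
The two-stage construction can be sketched for the simplest case of one change-point ($K = 1$), a single regressor ($p = 1$) and $\gamma = 1$, where the segment estimator has a closed form. Everything below is an assumption made for illustration: the function names, the exhaustive search, the minimal segment length `min_len`, and the tuning rule `m ** 0.4` (chosen only because it is of smaller order than the square root of the segment length `m`).

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z towards 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def segment_fit(x, y, lam):
    """Scalar LASSO on one segment: returns (phi_hat, penalised objective)."""
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    phi = soft_threshold(sxy, lam / 2.0) / sxx if sxx > 0 else 0.0
    rss = sum((b - a * phi) ** 2 for a, b in zip(x, y))
    return phi, rss + lam * abs(phi)


def one_break_search(x, y, min_len=2, lam_of_len=lambda m: m ** 0.4):
    """Stage 1: fit each segment for a candidate break l; stage 2: pick the best l.

    The candidate l is the number of observations in the first segment.
    """
    n = len(y)
    best = None
    for l in range(min_len, n - min_len + 1):
        phi1, s1 = segment_fit(x[:l], y[:l], lam_of_len(l))
        phi2, s2 = segment_fit(x[l:], y[l:], lam_of_len(n - l))
        if best is None or s1 + s2 < best[0]:
            best = (s1 + s2, l, phi1, phi2)
    return best[1], best[2], best[3]


# Noise-free illustration: the slope jumps from 1 to 3 after observation 5.
x = [1.0] * 10
y = [1.0] * 5 + [3.0] * 5
l_hat, phi1_hat, phi2_hat = one_break_search(x, y)
l_ls, phi1_ls, phi2_ls = one_break_search(x, y, lam_of_len=lambda m: 0.0)
```

For several breaks the exhaustive search becomes expensive; a dynamic-programming search over segments is a common alternative, which the paper does not discuss.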

Note that, in order to take into account the sample size in every segment (phase), the tuning
parameter $\lambda_{n;(l_{r-1},l_r)}$ varies from one segment to another with the interval length $l_r - l_{r-1}$.
For the true values of the parameters $(\phi^0_1, \dots, \phi^0_{K+1}, l^0_1, \dots, l^0_K)$, we consider the sum equivalent to
(1): $S_0 = \sum_{i=1}^{n} \varepsilon_i^2 + \sum_{r=1}^{K+1} \lambda_{n;(l^0_{r-1},l^0_r)} \sum_{j=1}^{p} |\phi^0_{r,j}|^{\gamma}$, where $\phi^0_{r,j}$ denotes the $j$th component of $\phi^0_r$.
For every observation, we consider the difference between the squared error obtained with some
parameter and the one obtained with the true parameter of the model. So, for two parameters $\phi$
and $\phi^0$, we define the function:

$$\eta_i(\phi; \phi^0) = (\varepsilon_i - X_i'(\phi - \phi^0))^2 - \varepsilon_i^2, \qquad i = 1, \dots, n.$$
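
A short expansion, added here only as a reading aid, shows why this difference is informative. It assumes the definition $\eta_i(\phi;\phi^0) = (\varepsilon_i - X_i'(\phi - \phi^0))^2 - \varepsilon_i^2$, a deterministic design and centered errors as in (H3).

```latex
\begin{align*}
\eta_i(\phi;\phi^0)
  &= \varepsilon_i^2 - 2\varepsilon_i X_i'(\phi-\phi^0)
     + \bigl(X_i'(\phi-\phi^0)\bigr)^2 - \varepsilon_i^2 \\
  &= \bigl(X_i'(\phi-\phi^0)\bigr)^2 - 2\varepsilon_i X_i'(\phi-\phi^0), \\
E\bigl[\eta_i(\phi;\phi^0)\bigr]
  &= \bigl(X_i'(\phi-\phi^0)\bigr)^2 \;\ge\; 0 .
\end{align*}
```

The expectation vanishes at $\phi = \phi^0$ and is nonnegative elsewhere, so sums of $\eta_i$ measure how far a candidate parameter is from the true one.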

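When a LASSO-type method is considered, for the model between two consecutive change-points
$j_1 < j_2$, a penalization term is added:

$$\eta^s_{i;(j_1,j_2)}(\phi; \phi^0) = \eta_i(\phi; \phi^0) + \frac{\lambda_{n;(j_1,j_2)}}{j_2 - j_1} \Big[ \sum_{k=1}^{p} |\phi_{,k}|^{\gamma} - \sum_{k=1}^{p} |\phi^0_{,k}|^{\gamma} \Big], \qquad i = j_1 + 1, \dots, j_2,$$

with $\phi_{,k}$ and $\phi^0_{,k}$ the $k$th component of $\phi$, respectively of $\phi^0$. We denote $\eta^s_i(\phi; \phi^0) = \eta^s_{i;(0,n)}(\phi; \phi^0)$
and $\lambda_n = \lambda_{n;(0,n)}$.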

With the LASSO-type estimators and the notations introduced, we can study the asymptotic
behavior of the estimators. First, supposing that the number of change-points is known, we
consider the corresponding convergence rate, which will allow us to derive the asymptotic
distribution of the estimators. We then propose a consistent estimator for the case when the
number $K$ is unknown.



3.2 Asymptotic behavior 

The following result implies that the penalized sum of $\eta^s$, optimized with respect to the regression
parameters, is of order $O_P(n^{\alpha})$, with $\alpha > 1/2$ arbitrary.

Lemma 1 Under assumptions (H2), (H3), if $\lambda_{n;(j_1,j_2)} = O(n^{1/2})$, for two points $j_1, j_2 \in \{1, \dots, n\}$,
$\gamma > 0$, $\phi^0$ the true value of the parameter, for all $\alpha > 1/2$ we have:

$$\sup_{0 \le j_1 < j_2 \le n} \Big| \inf_{\phi} \sum_{i=j_1+1}^{j_2} \eta^s_{i;(j_1,j_2)}(\phi; \phi^0) \Big| = O_P(\max(\lambda_{n;(j_1,j_2)}, n^{\alpha})) = O_P(n^{\alpha}).$$

The convergence rate of the LASSO-type estimator is of order $n^{-1/2}$ (see Knight and Fu, 2000). We
then study, by the following lemma, the penalized sum for a LASSO-type method in a model
without change-points, when the regression parameters are not in an $n^{-1/2}$-neighborhood of the
true value of the parameter.

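Lemma 2 Under assumptions (H2), (H3), if $\lambda_n = o(n)$, then there exists $\epsilon > 0$ such that:

$$\liminf_{n \to \infty} \inf_{\|\phi - \phi^0\| \ge n^{-1/2}} n^{-1} \sum_{i=1}^{n} \eta^s_i(\phi; \phi^0) > \epsilon.$$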

This result is useful to prove the following two lemmas. They indicate that when the data come
from two different models, the LASSO-type estimator in a model with a change-point is close to
the parameter of the model from which most of the data came.
For $u \in [3/4, 1)$ in assumption (H1) and $v \in (0, 1/4)$, let us consider a constant $\delta$ such that:

$$\delta \in (0, u - 3v). \qquad (2)$$

For the following lemma, the sample size of the model is $n_1 + n_2$.
Lemma 3 Under assumptions (H2), (H3), for all $n_1, n_2 \in \mathbb{N}$ such that $n_1 \ge n^u$, with $3/4 <
u < 1$, $n_2 \le n^v$, $v < 1/4$, let the model be:

$$Y_i = X_i'\phi^0_1 + \varepsilon_i, \quad i = 1, \dots, n_1,$$
$$Y_i = X_i'\phi^0_2 + \varepsilon_i, \quad i = n_1 + 1, \dots, n_1 + n_2,$$

with $\phi^0_1 \neq \phi^0_2$. We set $A^s_{n_1+n_2}(\phi) = \sum_{i=1}^{n_1} \eta^s_{i;(0,n_1)}(\phi; \phi^0_1) + \sum_{i=n_1+1}^{n_1+n_2} \eta^s_{i;(n_1,n_1+n_2)}(\phi; \phi^0_2)$ and
$\hat\phi^s_{n_1+n_2} = \arg\min_{\phi} A^s_{n_1+n_2}(\phi)$. Under condition (2) we have:
(i) H S ni+n2 0?ll < % V2 «^ < n-(«-«- *)/ 2 . 

Similarly, we have:

Lemma 4 Consider the model:

$$Y_i = X_i'\phi^0_1 + \varepsilon_i, \quad i = 1, \dots, k,$$
$$Y_i = X_i'\phi^0_2 + \varepsilon_i, \quad i = k + 1, \dots, k + n_2,$$

with $k \in [\alpha n_1, n_1]$, $\alpha \in (0, 1)$. Under the same conditions as in Lemma 3 we have:



(i) S ^Pa ni <k<n 2 I 

(H) sup oni < fe < n2 






The proofs of all Lemmas are given in Appendix. 

Suppose first that the number $K$ of change-points is known. We start by studying the convergence
rate of the LASSO-type change-point estimator. The proof of Theorem 1 (given in Appendix)
is split into three steps. First, using Lemma 1, we prove that the change-point estimators are
at a distance smaller than $n^{1/2}$ from the true values. This implies that between two consecutive
true change-points $l^0_r$, $l^0_{r+1}$ there are at most two change-point estimators. This allows us to
prove, using also Lemma 3, step 2: the change-point estimators are at a distance smaller than
$n^{1/4}$ from the true values. Finally, also using Lemma 4, Theorem 1 is proved.

Theorem 1 Under assumptions (H1)-(H3) we have $\hat l^s_r - l^0_r = O_P(1)$, for every $r = 1, \dots, K$.

Combining Theorem 1 with the fact that the convergence rate of LASSO-type estimators in
a model without change-points is $n^{-1/2}$ immediately gives the convergence rate of
the regression parameter estimator:

Corollary 1 Under assumptions (H1)-(H3) we have $\|\hat\phi^s_{(\hat l^s_{r-1}, \hat l^s_r)} - \phi^0_r\| = (l^0_r - l^0_{r-1})^{-1/2} O_P(1)$,
for all $r = 1, \dots, K+1$.

These results imply that the LASSO-type penalization has no influence on the convergence
rate, which is the same as for the LS method (see Kim and Kim, 2008): of order $(l^0_r - l^0_{r-1})^{-1/2}$ for
the regression parameters and $n^{-1}$ (after a change of variable) for the change-point estimators.
The following theorem, whose proof is given in Appendix and uses Theorem 1, gives the asymptotic
distribution of the LASSO-type estimators of the change-points. This result implies that the
asymptotic distribution is the same as for the change-point estimators by the LS method and that it
depends on the errors $(\varepsilon_i)$, the design $(X_i)$ around every true change-point and on the difference
$(\phi^0_{r+1} - \phi^0_r)$.

Remark 1 Since the matrix of assumption (H2) may fail to have full rank, the limiting distribution
of the change-point estimators cannot be the argmax of a Wiener process with shift, unlike for the
LS estimator (see Bai and Perron, 1998).
Theorem 2 Under assumptions (H1)-(H3), for each $r = 1, \dots, K$, we have that $\hat l^s_r - l^0_r$ converges
in distribution for $n \to \infty$ to $\arg\min_{j \in \mathbb{Z}} Z^{(r)}_j$, with $Z^{(r)}_0 = 0$ and

- for $j = 1, 2, \dots$, $Z^{(r)}_j = \sum_{i=l^0_r+1}^{l^0_r+j} \eta_i(\phi^0_r; \phi^0_{r+1})$,

- for $j = -1, -2, \dots$, $Z^{(r)}_j = \sum_{i=l^0_r+j}^{l^0_r} \eta_i(\phi^0_{r+1}; \phi^0_r)$.

The asymptotic distribution of the LASSO-type estimators of the regression parameters is given
by the following result. For the particular case $\gamma = 1$, the LASSO estimator is asymptotically
normal for the nonzero true regression coefficients. Note also that, as Knight and Fu (2000)
indicate, for $\gamma > 1$ the estimator $\hat\phi^s_{(\hat l^s_{r-1}, \hat l^s_r)}$ of the nonzero coefficients can be asymptotically
biased.

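Theorem 3 Under assumptions (H1)-(H3), if the matrices $C^0_r = \lim_{n \to \infty} (l^0_r - l^0_{r-1})^{-1} \sum_{i=l^0_{r-1}+1}^{l^0_r} X_i X_i'$
are not singular, then:

$$(\hat l^s_r - \hat l^s_{r-1})^{1/2} (\hat\phi^s_{(\hat l^s_{r-1}, \hat l^s_r)} - \phi^0_r) = (l^0_r - l^0_{r-1})^{1/2} (\hat\phi^s_{(l^0_{r-1}, l^0_r)} - \phi^0_r)(1 + o_P(1)) \xrightarrow[n \to \infty]{\mathcal{L}} \arg\min_{u \in \mathbb{R}^p} V_r(u),$$

with $V_r(u)$ defined by:

(i) if $\gamma > 1$, $V_r(u) = -2u'W_r + u'C^0_r u + \gamma \lambda^0_r \sum_{k=1}^{p} u_k \, \mathrm{sgn}(\phi^0_{r,k}) |\phi^0_{r,k}|^{\gamma - 1}$, with
$0 \le \lambda^0_r = \lim_{n \to \infty} \lambda_{n;(\hat l^s_{r-1}, \hat l^s_r)} / (\hat l^s_r - \hat l^s_{r-1})^{1/2}$.

(ii) if $\gamma = 1$, $V_r(u) = -2u'W_r + u'C^0_r u + \lambda^0_r \sum_{k=1}^{p} [u_k \, \mathrm{sgn}(\phi^0_{r,k}) 1_{\phi^0_{r,k} \neq 0} + |u_k| 1_{\phi^0_{r,k} = 0}]$, with
$0 \le \lambda^0_r = \lim_{n \to \infty} \lambda_{n;(\hat l^s_{r-1}, \hat l^s_r)} / (\hat l^s_r - \hat l^s_{r-1})^{1/2}$.

(iii) if $\gamma < 1$ and $\lambda_{n;(\hat l^s_{r-1}, \hat l^s_r)} / (\hat l^s_r - \hat l^s_{r-1})^{\gamma/2} \to \lambda^0_r \ge 0$, $V_r(u) = -2u'W_r + u'C^0_r u + \lambda^0_r \sum_{k=1}^{p} |u_k|^{\gamma} 1_{\phi^0_{r,k} = 0}$.

In the three cases, the random $p$-vector $W_r$ is the same: $W_r \sim \mathcal{N}(0, \sigma^2 C^0_r)$.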

We observe that the asymptotic distribution of the LASSO-type estimators of the change-points
does not depend on the tuning parameter $\lambda_{n;(l_{r-1},l_r)}$, while for the regression parameters this
sequence intervenes. This is due to the constraint $\lambda_{n;(l_{r-1},l_r)} = O(n^{1/2})$ and to the fact that the
convergence rate of the change-point estimators is of order $n^{-1}$.



3.3 Choice of change-point number 

Suppose now that we do not know a priori the number of change-points, which is quite often the
case in practice. We then introduce a criterion which allows us to estimate the number $K_0$ of
change-points. Intuitively, the true $K_0$ is the one which minimizes the function $S$ (or its
logarithm) with respect to $K$. In order to take into account the model complexity, the criterion is
penalized by an increasing function of $K$ and of the number of model parameters. The penalization
also depends on the sample size.

For a fixed number $K$ of change-points, let us denote $\hat s_K = S(\hat l^s_{1,K}, \dots, \hat l^s_{K,K})/n$, where the
function $S$ is defined by (1). The consistency of the criterion is established below, with the proof
in Appendix. The idea of the proof is to show that the estimator of the number of change-points is
strictly larger or smaller than the true value with a probability converging to zero.

Theorem 4 Let $\hat K_n$ be the value of $K$ that minimizes $B(K) = n \log \hat s_K + G(K, p_K) B_n$, with
$p_K = \sum_{r=1}^{K+1} \sum_{j=1}^{p} 1_{\phi^0_{r,j} \neq 0}$, the function $G(K, p_K)$ increasing in $K$, and $(B_n)$ a deterministic sequence
such that $B_n \to \infty$, $B_n n^{-3/4} \to 0$ and $B_n n^{-1/2} \to \infty$ for $n \to \infty$. Then $P[\hat K_n = K_0] \to 1$ for
$n \to \infty$.
The function $G$ is a generic factor multiplying the penalty $B_n$. If on every segment (phase) all
variables are significant, taking $G(K, p_K) = K$, we obtain the Schwarz criterion proposed by Yao (1988).
Concretely, in practice, we begin by finding $\hat K_n$ using the proposed criterion. After that, the
regression parameters on the segments defined by the change-points are estimated
by minimizing the function $T_n$ with respect to $\theta_1$. Finally, the function $S$ given by (1) is
minimized with respect to $\theta_2$ to locate the change-points.
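
The selection rule can be sketched as follows, assuming the minimized penalized sums per observation $\hat s_K$ have already been computed for each candidate $K$. The function name `select_break_number`, the dictionary inputs and the numerical values are hypothetical; $G(K, p_K) = K$ is the Schwarz-type choice mentioned above, and $B_n = n^{0.6}$ is an arbitrary illustrative sequence that grows faster than $n^{1/2}$.

```python
import math


def select_break_number(s_hat, p_K, n, B_n, G=lambda K, pK: K):
    """Return the K minimising n * log(s_hat[K]) + G(K, p_K[K]) * B_n.

    s_hat: dict mapping each candidate K to the minimised penalised sum / n.
    p_K:   dict mapping each candidate K to its number of nonzero coefficients.
    """
    best_K, best_val = None, float("inf")
    for K in sorted(s_hat):
        val = n * math.log(s_hat[K]) + G(K, p_K[K]) * B_n
        if val < best_val:
            best_K, best_val = K, val
    return best_K


# Made-up values: the fit improves a lot from K = 1 to K = 2, barely afterwards.
n = 200
s_hat = {1: 1.60, 2: 1.00, 3: 0.98}
p_K = {1: 10, 2: 15, 3: 20}
K_hat = select_break_number(s_hat, p_K, n, B_n=n ** 0.6)
```

The penalty term is what stops the criterion from always preferring the largest candidate, since $\hat s_K$ can only decrease as breaks are added.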



4 Adaptive LASSO 

As mentioned in the Introduction, in a model without change-points the LASSO-type estimator has
the oracle properties for $\gamma \in (0, 1)$ only, but it is not continuous. On the other hand, the LASSO
estimator (for $\gamma = 1$) is continuous but does not satisfy the oracle properties. To remedy these
drawbacks, Zou (2006) proposed an adaptive LASSO estimator. These properties are all
the more interesting in a change-point model, since they allow the selection of the significant
predictive variables on every phase without carrying out hypothesis tests. We propose the adaptive
LASSO estimator for a model with change-points.



4.1 Notations 

In a similar way to the LASSO-type method, between two consecutive change-points $l_{r-1}$ and
$l_r$, we define the regression parameter estimator by the adaptive LASSO method:

$$\hat\phi^{*}_{(l_{r-1},l_r)} = \arg\min_{\phi} \Big[ \sum_{i=l_{r-1}+1}^{l_r} (Y_i - X_i'\phi)^2 + \lambda_{n;(l_{r-1},l_r)} \sum_{k=1}^{p} \hat\omega_{(l_{r-1},l_r),k} |\phi_{,k}| \Big], \qquad (3)$$

with $\hat\phi^{*}_{(l_{r-1},l_r),k}$ its $k$th component, $\hat\omega_{(l_{r-1},l_r)} = |\hat\phi_{(l_{r-1},l_r)}|^{-g}$, $\hat\omega_{(l_{r-1},l_r),k}$ the $k$th component of
$\hat\omega_{(l_{r-1},l_r)}$ and $\hat\phi_{(l_{r-1},l_r)}$ the LS estimator of $\phi$ calculated between $l_{r-1}$ and $l_r$. The constant $g$
is positive and will be specified later. As for the LASSO-type estimator, the tuning parameter
$\lambda_{n;(l_{r-1},l_r)}$ depends on the sample size in every segment.
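
The weighted penalty can be sketched on one fixed segment as below. This is an illustration under assumptions, not the paper's numerical method (the simulations rely on an R package): the LS estimate `phi_ls` is taken as given, the weights are $|\hat\phi_{LS,k}|^{-g}$, the names are hypothetical, and $g = 0.2$ is an arbitrary illustrative value.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z towards 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_lasso_segment(X, y, lam, phi_ls, g, n_sweeps=200):
    """Minimise sum_i (y_i - x_i' phi)^2 + lam * sum_k w_k |phi_k|,
    with adaptive weights w_k = |phi_ls[k]|^(-g), by coordinate descent."""
    n, p = len(X), len(X[0])
    weights = [abs(b) ** (-g) if b != 0.0 else float("inf") for b in phi_ls]
    phi = [0.0] * p
    col_ss = [sum(X[i][k] ** 2 for i in range(n)) for k in range(p)]
    for _ in range(n_sweeps):
        for k in range(p):
            if col_ss[k] == 0.0 or weights[k] == float("inf"):
                phi[k] = 0.0  # an infinite weight forces the coefficient to 0
                continue
            rho = 0.0
            for i in range(n):
                # fitted value without the contribution of coordinate k
                fit_wo_k = sum(X[i][j] * phi[j] for j in range(p) if j != k)
                rho += X[i][k] * (y[i] - fit_wo_k)
            phi[k] = soft_threshold(rho, lam * weights[k] / 2.0) / col_ss[k]
    return phi


# Illustrative orthogonal design; the LS estimate here is (2.0, 0.1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
phi_adaptive = adaptive_lasso_segment(X, y, lam=1.0, phi_ls=[2.0, 0.1], g=0.2)
```

Because the weight is small for a large LS coefficient and large for a small one, the first coefficient is shrunk less than under an unweighted penalty with the same `lam`, while the second is still set to zero.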



Consider the sum with an adaptive penalization:

$$S^{*}(l_1, \dots, l_K) = \inf_{\theta_1} \sum_{r=1}^{K+1} \Big[ \sum_{i=l_{r-1}+1}^{l_r} (Y_i - X_i'\phi_r)^2 + \lambda_{n;(l_{r-1},l_r)} \sum_{k=1}^{p} \hat\omega_{(l_{r-1},l_r),k} |\phi_{r,k}| \Big].$$

Now, the penalized sum for the true parameters is:
$S^{*}_0 = \sum_{i=1}^{n} \varepsilon_i^2 + \sum_{r=1}^{K+1} \lambda_{n;(l^0_{r-1},l^0_r)} \sum_{k=1}^{p} \hat\omega_{(l^0_{r-1},l^0_r),k} |\phi^0_{r,k}|$.

The sum $S^{*}$ allows us to define the adaptive LASSO estimator of the change-points,
$(\hat l^{s*}_1, \dots, \hat l^{s*}_K) = \arg\min_{(l_1, \dots, l_K) \in \mathbb{N}^K} S^{*}(l_1, \dots, l_K)$, and, using (3), of the regression parameters:
$\hat\phi^{*}_{(\hat l^{s*}_{r-1}, \hat l^{s*}_r)}$, for each $r = 1, \dots, K+1$.

We define, between two consecutive change-points $j_1 < j_2$, for the true value $\phi^0$ of the
parameters, the following penalized difference:

$$\eta^{*}_{i;(j_1,j_2)}(\phi; \phi^0) = \eta_i(\phi; \phi^0) + \frac{\lambda_{n;(j_1,j_2)}}{j_2 - j_1} \sum_{k=1}^{p} \hat\omega_{(j_1,j_2),k} \big[ |\phi_{,k}| - |\phi^0_{,k}| \big],$$

with $\phi_{,k}$ and $\phi^0_{,k}$ the $k$th component of $\phi$, respectively of $\phi^0$.

In order to study the oracle properties, for each two consecutive true change-points $l^0_{r-1}$, $l^0_r$
consider the set

$$\mathcal{A}^0_{(l^0_{r-1},l^0_r)} = \{k \in \{1, \dots, p\};\ \phi^0_{r,k} \neq 0\}$$

of the indices of the nonzero components of the true regression parameters. Since in practical
problems we do not know the true values of the change-points, but only their estimators, we
consider the similar set of indices corresponding to the adaptive LASSO estimators $\hat l^{s*}_{r-1}$, $\hat l^{s*}_r$ of
the change-points:

will be abbreviated $\mathcal{A}^{*}$ for convenience. For simplicity we denote by $\phi_{\mathcal{A}^{*}}$ the sub-vector of $\phi$
containing the components with index in $\mathcal{A}^{*}$. We also denote by $C^0_{r,kj}$ the $(k,j)$th component
of the matrix $C^0_r$.



4.2 Asymptotic behavior 

By Lemmas 5 and 6 we prove that the adaptive penalization is not of a larger order than the
minimized sum of squares.

Lemma 5 Consider the model $Y = X\phi + \varepsilon$, with $Y$ the $n \times 1$ vector of the $Y_i$ and $X$ an $n \times p$ matrix. If $\phi^0$ is
the true value of the parameter $\phi$, under assumptions (H2), (H3), for $g \in (0, 1/4)$, $0 \le j_1 < j_2 \le n$,

\i,(jij,) = oin 1 / 2 ), we have: K,(Juh)™(ji,J2) = Opin^ 11 ). 

The following lemmas are needed for the proofs of the adaptive LASSO results. In fact, they study
the adaptive LASSO estimator in a model without change-points and they are the equivalent of
Lemmas 1 and 2.

Lemma 6 For two points $j_1, j_2 \in \{0, 1, \dots, n\}$, $\phi^0$ the true value of the parameters, under
assumptions (H2), (H3), if $\lambda_{n;(j_1,j_2)} = o(n^{1/2})$, then, for all $g \in (0, 1/4)$ we have:



sup 

0<ji<J2<n 



i=h+l 



F (n 2 ). 



Lemma 7 Under assumptions (H2), (H3), if $\lambda_n = o(n^{1/2})$, then there exists $\epsilon > 0$ such that:

$$\liminf_{n \to \infty} \inf_{\|\phi - \phi^0\| \ge n^{-1/2}} n^{-1} \sum_{i=1}^{n} \eta^{*}_{i;(0,n)}(\phi; \phi^0) > \epsilon.$$

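Lemma 8 For all $n_1, n_2 \in \mathbb{N}$ such that $n_1 \ge n^u$, with $3/4 < u < 1$, $n_2 \le n^v$, $v < 1/4$, let us
consider the model:

$$Y_i = X_i'\phi^0_1 + \varepsilon_i, \quad i = 1, \dots, n_1,$$
$$Y_i = X_i'\phi^0_2 + \varepsilon_i, \quad i = n_1 + 1, \dots, n_1 + n_2,$$

with the assumption $\phi^0_1 \neq \phi^0_2$. Consider $A^{*}_{n_1+n_2}(\phi) = \sum_{i=1}^{n_1} \eta^{*}_{i;(0,n_1)}(\phi; \phi^0_1) + \sum_{i=n_1+1}^{n_1+n_2} \eta^{*}_{i;(n_1,n_1+n_2)}(\phi; \phi^0_2)$
and $\hat\phi^{*}_{n_1+n_2} = \arg\min_{\phi} A^{*}_{n_1+n_2}(\phi)$. Under the condition (2) and assumptions (H2), (H3), we
have:

(ii) $\sum_{i=1}^{n_1} \eta^{*}_{i;(0,n_1)}(\hat\phi^{*}_{n_1+n_2}; \phi^0_1) = o_P(1)$.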
The equivalent of Lemma 4 is also valid for the adaptive LASSO estimators.

As in Section 3, let us first suppose that the number $K$ of change-points is fixed. By the same
arguments as in the proof of Theorem 1, using now the lemmas of this section, we have the
following theorem, which gives the convergence rate of the adaptive LASSO estimators of the
change-points. The proof is omitted.

Theorem 5 Under assumptions (H1)-(H3), for all $g \in (0, 1/4)$, we have $\hat l^{s*}_r - l^0_r = O_P(1)$, for
each $r = 1, \dots, K$.

In order to work in a bounded interval, we can consider $\tau^0_r = l^0_r / n$ and its estimator
$\hat\tau^{s*}_r = \hat l^{s*}_r / n$. Then, Theorem 5 implies that $\hat\tau^{s*}_r$ converges in probability to $\tau^0_r$ with the
convergence rate $n^{-1}$. It is the same convergence rate as for the LS method (see Bai and Perron, 1998).

We need that g £ (0, 4) since the equivalent of the relation (TTTT) . for samples between two true 
change-points, is ~Op(n^~) which must be <C Op([n p ]). This will imply that every change- 
point estimator is to a distance strictly smaller than n p with respect to the true value. The 
constant 1/4 results from the supposition that the constant $u$ of assumption (H1) is larger than
3/4. In fact, the condition satisfied by the positive constant $g$ is: $g + u < 1$. In a model without
change-points the constant $g$ can take any positive value, while in a change-point model it
depends on the distance between the change-points.

Theorem 5 combined with the oracle properties of the adaptive LASSO estimator in a model
without change-points (see Zou, 2006) yields that the convergence rate of $\hat\phi^{*}_{(\hat l^{s*}_{r-1}, \hat l^{s*}_r)}$ to $\phi^0_r$ is
of order $(l^0_r - l^0_{r-1})^{-1/2}$, the same as for the LASSO-type estimator and for the LS method (see Bai
and Perron, 1998).

Remark 2 By arguments similar to those used in the proof of Theorem 5 we can prove that the
asymptotic distribution of the differences $\hat l^{s*}_r - l^0_r$ is the same as for the LASSO-type estimators.

Because of the presence of change-points in the model, the important oracle properties of
the adaptive LASSO estimator of the regression parameters are not obvious.

The following result proves that, on every segment, the adaptive LASSO estimator of the
regression parameters has the oracle properties: the estimator of the nonzero parameters on each
estimated segment is asymptotically normal and the zero parameters are shrunk directly to 0 with a
probability converging to 1.

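Theorem 6 Under assumptions (H1)-(H3), $g \in (0, 1/4)$, if $\lambda_{n;(l^0_{r-1},l^0_r)} (l^0_r - l^0_{r-1})^{-1/2} \to 0$ and
$\lambda_{n;(l^0_{r-1},l^0_r)} (l^0_r - l^0_{r-1})^{(g-1)/2} \to \infty$ for $n \to \infty$, then:

(i) $(\hat l^{s*}_r - \hat l^{s*}_{r-1})^{1/2} (\hat\phi^{*}_{(\hat l^{s*}_{r-1}, \hat l^{s*}_r)} - \phi^0_r)_{\mathcal{A}^{*}} = (l^0_r - l^0_{r-1})^{1/2} (\hat\phi^{*}_{(l^0_{r-1}, l^0_r)} - \phi^0_r)_{\mathcal{A}^{*}} (1 + o_P(1)) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma^2 (M_r)^{-1})$,

where, for $q_r = \mathrm{Card}\{\mathcal{A}^{*}\}$, $M_r = (C^0_{r,kj})_{k,j \in \mathcal{A}^{*}}$ is a $q_r \times q_r$ matrix.

(ii) $\lim_{n \to \infty} P[\hat{\mathcal{A}}^{*}_{n;(\hat l^{s*}_{r-1}, \hat l^{s*}_r)} = \hat{\mathcal{A}}^{*}_{n;(l^0_{r-1}, l^0_r)} = \mathcal{A}^0_{(l^0_{r-1}, l^0_r)}] = 1$.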

Theorem 6 is proved in Appendix. The proof is based on the lemmas of this section, the
Karush-Kuhn-Tucker (KKT) conditions and the oracle properties in a model without change-points.
Note that the estimators of the nonzero coefficients are asymptotically unbiased.
Let us now vary the change-point number K. Choosing

$(\hat l^*_{1,K}, \dots, \hat l^*_{K,K}) = \arg\min_{(l_1,\dots,l_K)} S^*(l_1, \dots, l_K)$

and $\hat s^*_K = S^*(\hat l^*_{1,K}, \dots, \hat l^*_{K,K})/n$, similarly to Theorem 2 and its proof, we can define a
consistent criterion for the change-point number: $\hat K^*_n = \arg\min_K (n \log \hat s^*_K + G(K, p_K) B_n)$, with
the function G and the sequence $B_n$ as in Theorem 2.
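
The criterion can be evaluated mechanically once the values $\hat s^*_K$ are available for each candidate K. The following Python sketch is illustrative only: the function name `select_num_changepoints`, the dictionary input, and the default choices $G(K, p_K) = K$ and $B_n = n^{5/8}$ (the values used in the simulations of Section 5) are assumptions made for this example, and the numbers in the usage lines are invented.

```python
import math

def select_num_changepoints(s_by_K, n, B_n=None, G=lambda K: K):
    """Return (K_hat, criterion values) for B(K) = n*log(s_K) + G(K)*B_n.

    s_by_K : dict mapping each candidate K to s_K = S*(l_1K, ..., l_KK) / n.
    B_n    : penalty sequence; defaults to n**(5/8) (assumed choice).
    G      : penalty function of K; defaults to G(K) = K (assumed choice).
    """
    if B_n is None:
        B_n = n ** (5 / 8)
    crit = {K: n * math.log(s) + G(K) * B_n for K, s in s_by_K.items()}
    K_hat = min(crit, key=crit.get)  # candidate K with the smallest criterion
    return K_hat, crit

# Hypothetical values of s_K for K = 0, 1, 2 (not taken from the paper).
K_hat, crit = select_num_changepoints({0: 12.0, 1: 1.0, 2: 0.95}, n=100)
```

With these invented inputs the drop in $\hat s^*_K$ from K = 1 to K = 2 is too small to pay for the extra penalty $B_n$, so the criterion retains K = 1.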

Remark 3 Since, on every segment, the adaptive LASSO estimator of the regression parameter
has the oracle properties, we can consider instead of $p_K$ its estimator

$\hat p_K = \sum_{r=1}^{K+1} \sum_{j=1}^{p} 1_{\hat\phi^*_{(\hat l^*_{r-1},\hat l^*_r),j} \neq 0}$,

which was not possible for the LASSO estimator ($\gamma = 1$) studied in Section 3. The adaptive
LASSO criterion is therefore more attractive numerically.

Remark 4 If p is large compared to n, or more precisely if (K + 1)p > n, there exists at least
one segment where the LS estimator cannot be calculated. The adaptive LASSO estimator then
cannot be calculated either, and in this case the LASSO-type method must be used.
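
As a small illustration of Remark 4, the sketch below checks, for a candidate set of change-points, which segments contain at least p observations. This is only a necessary condition for the LS design matrix of a segment to have full column rank. The helper name `ls_feasible_segments` and its interface are hypothetical.

```python
def ls_feasible_segments(change_points, n, p):
    """Flag the segments on which an LS fit with p covariates is possible.

    change_points : increasing list l_1 < ... < l_K of candidate change-points.
    Returns (flags, lengths): flags[r] is True when segment r has >= p points.
    """
    bounds = [0, *change_points, n]
    lengths = [b - a for a, b in zip(bounds[:-1], bounds[1:])]
    flags = [m >= p for m in lengths]
    return flags, lengths

# Setting of Table 1: n = 50, p = 10, change-points at 20 and 35.
flags_ok, lengths_ok = ls_feasible_segments([20, 35], n=50, p=10)
# A case with (K + 1) p = 30 > n = 25: every segment here is too short.
flags_bad, lengths_bad = ls_feasible_segments([8, 16], n=25, p=10)
```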



5 Simulations 

To illustrate the theoretical results and to compare the performance of the adaptive LASSO
method with the classical LS method in a change-point model, we perform a simulation study.
These simulations show the advantages of the proposed method in terms of detection of
irrelevant predictive variables. The results obtained indicate that the proposed method will be very
useful for a high-dimensional change-point model.

All simulations were performed using the R language. To calculate the adaptive LASSO
estimations, the function lqa of the package lqa was used.
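
For readers who want to experiment outside R, the following Python sketch computes an adaptive LASSO fit on a single segment. It is not the lqa routine: a plain coordinate-descent solver is swapped in, and the function name `adaptive_lasso_segment`, its arguments and its stopping rule are assumptions made for this example. The objective minimised is $\sum_i (Y_i - X_i'\phi)^2 + \lambda \sum_k \hat\omega_k |\phi_k|$ with $\lambda = m^p$ and $\hat\omega_k = |\hat\phi_{LS,k}|^{-g}$, where m is the segment length.

```python
import numpy as np

def adaptive_lasso_segment(X, y, g=0.2, p=0.45, n_iter=500, tol=1e-10):
    """Adaptive LASSO on one segment by coordinate descent (illustrative).

    Minimises sum_i (y_i - x_i' phi)^2 + lam * sum_k w_k * |phi_k|,
    with lam = m**p, w_k = |phi_LS,k|**(-g) and m the number of rows of X.
    """
    m, d = X.shape
    phi_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # LS fit gives the weights
    lam = m ** p
    w = np.abs(phi_ls) ** (-g)
    phi = phi_ls.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        phi_old = phi.copy()
        for k in range(d):
            # Partial residual that leaves out the k-th covariate.
            r_k = y - X @ phi + X[:, k] * phi[k]
            rho = X[:, k] @ r_k
            thr = lam * w[k] / 2.0
            # Soft-thresholding update for coordinate k.
            phi[k] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[k]
        if np.max(np.abs(phi - phi_old)) < tol:
            break
    return phi
```

The defaults g = 1/5 and p = 9/20 match one of the parameter pairs used in this section; a very small tuning parameter reproduces the LS fit, and a very large one sets every coefficient to zero.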

First, the number of phases is assumed to be known. We consider 10 latent variables $X_1, \dots, X_{10}$
with $X_3 \sim N(2,1)$, $X_4 \sim N(4,1)$, $X_5 \sim N(1,1)$ and $X_j \sim N(0,1)$ for $j \in \{1,2,6,7,8,9,10\}$.
The models contain two change-points (three phases) and the errors are standard Gaussian.
The true values of the regression parameters (coefficients) on the three segments are respectively:
(1, 0, 4, 0, -3, 5, 6, 0, -1, 0), (0, 3, -4, -3, 0, 1, 2, -3, 0, 10), (1, 3, 4, 0, 0, 1, 0, 0, 0, 1). The tuning
parameter $\lambda_{n;(l_{r-1},l_r)}$ is $(l_r - l_{r-1})^p$ and the two change-points can vary in the interval [1, n]. For
the adaptive LASSO method, various values of the parameters g and p are considered. Recall that
g is the power of the adaptive penalization $\hat\omega_{(l_{r-1},l_r)} = |\hat\phi_{(l_{r-1},l_r)}|^{-g}$ in relation ([3]) and p is the
power of the tuning parameter $\lambda_{n;(l_{r-1},l_r)} = (l_r - l_{r-1})^p$ on each interval $[l_{r-1}, l_r]$. The sample
size n varies from 35 to 400. The classical LS method is also considered. For each model, we
generated 500 Monte Carlo random samples of size n. The percentage of zero coefficients correctly
estimated as zero (true 0) and the percentage of nonzero coefficients estimated as zero (false 0)
are computed (see Tables [1]-[5]). Since the asymptotic distribution of the change-point
estimators may not be symmetric, in each table we also give the median of the change-point estimations.
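
The design described above can be reproduced with a few lines of code. The Python sketch below is an assumption-laden illustration, not the authors' R script: the function name `simulate_three_phase`, the seed handling and the convention that observations 1, ..., $l_1$ form the first phase are choices made for this example.

```python
import numpy as np

def simulate_three_phase(n, l1, l2, seed=0):
    """Draw one sample (X, y) from the three-phase model of this section."""
    rng = np.random.default_rng(seed)
    means = np.zeros(10)
    means[[2, 3, 4]] = [2.0, 4.0, 1.0]  # X_3 ~ N(2,1), X_4 ~ N(4,1), X_5 ~ N(1,1)
    X = rng.normal(loc=means, scale=1.0, size=(n, 10))
    phi = np.array([
        [1, 0, 4, 0, -3, 5, 6, 0, -1, 0],
        [0, 3, -4, -3, 0, 1, 2, -3, 0, 10],
        [1, 3, 4, 0, 0, 1, 0, 0, 0, 1],
    ], dtype=float)
    seg = np.zeros(n, dtype=int)  # phase index of each observation
    seg[l1:] = 1
    seg[l2:] = 2
    eps = rng.standard_normal(n)  # standard Gaussian errors
    y = np.einsum("ij,ij->i", X, phi[seg]) + eps
    return X, y, phi

# One sample in the setting of Table 1 (n = 50, change-points 20 and 35).
X50, y50, phi_true = simulate_three_phase(50, 20, 35)
```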

We observe that, if there are segments with a small sample size, the percentage of true zeros
detected is relatively low and the percentage of false zeros is high. The same results
Table 1 Median of change-point estimations, percentage of true 0 and of false 0 by adaptive LASSO and LS
methods for n = 50, K = 2, $l^0_1 = 20$, $l^0_2 = 35$.

(g, p) =                               (1/7, 13/28)   (1/6, 11/24)   (1/5, 9/20)   (9/40, 2/5)   LS
median of $(\hat l^*_1, \hat l^*_2)$   (20,35)        (20,35)        (20,35)       (20,35)       (20,35)
% of true 0                            77             78             77            77
% of false 0                           22             22             20            18






Table 2 Median of change-point estimations, percentage of true 0 and of false 0 by adaptive LASSO and LS
methods for n = 100, K = 2, $l^0_1 = 20$, $l^0_2 = 85$.

(g, p) =                               (1/7, 13/28)   (1/6, 11/24)   (1/5, 9/20)   (9/40, 2/5)   (?, ?)    LS
median of $(\hat l^*_1, \hat l^*_2)$   (20,85)        (20,85)        (20,85)       (20,85)       (20,85)   (20,85)
% of true 0                            86             87             88            88            89
% of false 0                           18             18             17            14            12






Table 3 Median of change-point estimations, percentage of true 0 and of false 0 by adaptive LASSO and LS
methods for n = 400, K = 2, $l^0_1 = 20$, $l^0_2 = 385$.

(g, p) =                               (1/6, 11/24)   (1/5, 9/20)   (9/40, 2/5)   LS
median of $(\hat l^*_1, \hat l^*_2)$   (20,385)       (20,385)      (20,385)      (20,385)
% of true 0                            90             90            90
% of false 0                           16             16            12






Table 4 Median of change-point estimations, percentage of true 0 and of false 0 by adaptive LASSO and LS
methods for n = 500, K = 2, $l^0_1 = 200$, $l^0_2 = 400$.

(g, p) =                               (1/6, 11/24)   (1/5, 9/20)   (9/40, 2/5)   LS
median of $(\hat l^*_1, \hat l^*_2)$   (200,400)      (200,400)     (200,400)     (200,400)
% of true 0                            99.9           99.9          100
% of false 0                           9.5            8             4






were obtained in the simulations of Zou (2006) for a model without change-points (with a sample
size equal to 60 and four covariates, the largest percentage obtained by Zou for the detection of
the zeros was 73%). We observe that the detection rate of the true zeros
varies more with the sample size on every segment than with the parameters g or p. This rate
increases slightly with g and it does not depend on the location of the change-points: equidistant
or not. Recall that the performance of the criteria proposed by Wu (2008) varied with the
change-point location for fixed n. In all cases, even for a small number of observations, the
median of the obtained estimations coincides with the true values of the change-points.
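
The two percentages reported in the tables can be computed from the Monte Carlo estimates as follows. This Python sketch is a hypothetical helper (the name `zero_detection_rates` and the optional tolerance `tol` are assumptions), and the small arrays in the usage lines are invented for illustration.

```python
import numpy as np

def zero_detection_rates(phi_hat, phi_true, tol=0.0):
    """Return (% of true 0, % of false 0) over all replications.

    phi_hat  : estimates, with the replications stacked along the first axis.
    phi_true : true coefficients, broadcastable against phi_hat.
    """
    phi_hat = np.asarray(phi_hat, dtype=float)
    est_zero = np.abs(phi_hat) <= tol
    true_zero = np.broadcast_to(np.asarray(phi_true) == 0, est_zero.shape)
    pct_true0 = 100.0 * est_zero[true_zero].mean()    # zeros estimated as zero
    pct_false0 = 100.0 * est_zero[~true_zero].mean()  # nonzeros estimated as zero
    return pct_true0, pct_false0

# Two invented replications for a 4-coefficient segment.
rates = zero_detection_rates(
    [[0.9, 0.0, 3.8, 0.1], [1.1, 0.0, 0.0, 0.0]],
    [1.0, 0.0, 4.0, 0.0],
)
```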
Table 5 Median of change-point estimations, percentage of true 0 and of false 0 by adaptive LASSO and LS
methods for n = 1500, K = 2, $l^0_1 = 200$, $l^0_2 = 400$.

(g, p) =                               (1/6, 11/24)   (1/5, 9/20)   (9/40, 2/5)   LS
median of $(\hat l^*_1, \hat l^*_2)$   (200,400)      (200,400)     (200,400)     (200,400)
% of true 0                            100            100           100
% of false 0                           4.9            3.5           2.3






Hence, when the sample size increases, the adaptive LASSO method tends to select the true
model. Because it involves no penalization, the LS method does not exhibit this good property:
all its estimates are nonzero and, in order to identify the parameters, supplementary
hypothesis tests on every phase are necessary.



In order to illustrate the model selection criterion, we now simulate a linear model with
one change-point: $Y_i = X_i'\phi^0_1 1_{1 \le i \le l^0} + X_i'\phi^0_2 1_{l^0 < i \le n} + \varepsilon_i$, $i = 1, \dots, n$, with n = 100,
$\phi^0_1$ = (1, 0, 4, 0, -3, 5, 6, 0, -1, 0), $\phi^0_2$ = (1, 3, 4, 0, 0, 1, 0, 0, 0, 1) and $l^0 = 35$. The criterion B(K) for
the adaptive method is computed for $B_n = n^{5/8}$, $G(K, p_K) = K$, and the parameters for the
adaptive LASSO are g = 1/5, p = 9/20. For this model 200 Monte Carlo samples of size n are
generated for the regressor X and the error $\varepsilon$. We obtain that $\arg\min_{K \in \{0,1,2\}} B(K) = 1$ for each
Monte Carlo model replication.



Conclusion These simulations show the advantages of the proposed (adaptive LASSO)
method in terms of detection of irrelevant variables in a model with change-points and also of
estimation of the number of breaks. For a large enough sample size the adaptive LASSO method
selects the true model, independently of the change-point location. On the other hand, the constant g
slightly affects the detection of the true parameters. The change-points are correctly estimated.



6 Appendix 

Here we present the proofs of the results stated in Sections 3 and 4. We first give the proofs of
the lemmas, which are useful to prove the main results.



6.1 Proofs of Lemmas 



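The first four lemmas concern the LASSO-type estimator.
Proof of Lemma [1] We first show that:

$\sup_{0 \le j_1 < j_2 \le n} \left| \inf_{\phi} \sum_{i=j_1+1}^{j_2} \eta_i(\phi; \phi^0) \right| = O_P(n^{\alpha})$.    (4)

Since $E[\varepsilon_i] = 0$, $E[\eta_i(\phi; \phi^0)] \ge 0$ and $\eta_i(\phi^0; \phi^0) = 0$, it holds that:
$0 \ge \inf_{\phi} \sum_{i=j_1+1}^{j_2} \eta_i(\phi; \phi^0) \ge \inf_{\phi} \sum_{i=j_1+1}^{j_2} [\eta_i(\phi; \phi^0) - E[\eta_i(\phi; \phi^0)]]$.
Thus $|\inf_{\phi} \sum_{i=j_1+1}^{j_2} \eta_i(\phi; \phi^0)| \le \sup_{\phi} |\sum_{i=j_1+1}^{j_2} [\eta_i(\phi; \phi^0) - E[\eta_i(\phi; \phi^0)]]|$
and relation (4) follows as in Lemma 3 of Bai (1998), using (H3). For $\eta^s_i$ we have: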
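$|\eta^s_{i;(j_1,j_2)}(\phi; \phi^0) - \eta_i(\phi; \phi^0)| = \frac{\lambda_{n;(j_1,j_2)}}{j_2 - j_1} \left| \sum_{k=1}^{p} (|\phi_k|^{\gamma} - |\phi^0_k|^{\gamma}) \right|$,

which implies, taking into account (4), that:
$\sup_{0 \le j_1 < j_2 \le n} |\inf_{\phi} \sum_{i=j_1+1}^{j_2} \eta^s_{i;(j_1,j_2)}(\phi; \phi^0)| \le \sup_{0 \le j_1 < j_2 \le n} (|\inf_{\phi} \sum_{i=j_1+1}^{j_2} \eta_i(\phi; \phi^0)| + c \lambda_{n;(j_1,j_2)}) = O_P(n^{\alpha}) + O(n^{1/2})$. ■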

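Proof of Lemma [2] Since $\lambda_n = o(n)$, for all $\varepsilon > 0$ and $\phi \in \Gamma$ there exists $n_\varepsilon \in N$ such that:

$n^{-1} \lambda_n \left| \sum_{k=1}^{p} |\phi_k|^{\gamma} - \sum_{k=1}^{p} |\phi^0_k|^{\gamma} \right| < \varepsilon/2, \quad \forall n \ge n_\varepsilon$,    (5)

uniformly in $\phi$. Under (H2) and (H3), we have $E[n^{-1} \sum_{i=1}^{n} \eta_i(\phi; \phi^0)] > 0$ for $\|\phi - \phi^0\| \ge n^{-1/2}$
and furthermore $Var[n^{-1} \sum_{i=1}^{n} \eta_i(\phi; \phi^0)] \to 0$. Then there exists $\varepsilon > 0$ such that, with
probability 1:

$\liminf_{n \to \infty} \inf_{\|\phi - \phi^0\| \ge n^{-1/2}} n^{-1} \sum_{i=1}^{n} \eta_i(\phi; \phi^0) > \varepsilon$.    (6)

On the other hand:
$\inf_{\|\phi - \phi^0\| \ge n^{-1/2}} n^{-1} \sum_{i=1}^{n} \eta^s_i(\phi; \phi^0) \ge \inf_{\|\phi - \phi^0\| \ge n^{-1/2}} n^{-1} \sum_{i=1}^{n} \eta_i(\phi; \phi^0) - n^{-1} \lambda_n \sup_{\|\phi - \phi^0\| \ge n^{-1/2}} \left| \sum_{k=1}^{p} |\phi_k|^{\gamma} - \sum_{k=1}^{p} |\phi^0_k|^{\gamma} \right|$.
The Lemma follows taking into account the relations (5) and (6). ■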

Proof of Lemma [3] (i) We show first that the assertion is true for the LS estimator: <j) n +n = 
argmhij, ££l 1 " 2 (^~ X^</>) 2 . Taking into account the assumptions (H2) and (H3), we have for all 
4> € r, X^r=n+i 7 7«(0! 0a) = Op{n v ). By way of contradiction, suppose that for the LASSO-type 



estimator: ||0 ni+ „ 2 -0x11 > n\ 



-V2„ 



. Then, since m > n u , we have £™=i»7j(0 ni+ „ 2 ;0?) > 



< 



nin x % " > n" +l5 . Thus, taking into account that n% < ri": inf<^ X)"=i %(0j 0i) + E™=n™+i ^(0) 02) 
JP (n"+ 5 )+0 J p(^) = P (n"+ 5 ). On the other hand: hn> [^=1 *7i(0! 0i) + E££w %(05 0°) 
E™=nr+i( e i — ^i(0? _ 0")) 2 = Op{ri"). Contradiction, between the two last results. Then 

Il0„ 1+ 1 n 2 -0?ll<^ ( "^ 5)/2 - 

Evidently E?=^i ^ ; („ 1 ,„ I+ „ 2 )(0; 0°) = O^K)+°M = P (rf). We suppose that: ||& 1+na - 

0?|| > n~ 1/2 n^~ . Then, by LemmaOU £,"=1 ^(o.nO^niW 0°) > Oj 3 (max(A n .( 0l „ 1 ),ra 1 u )) 
with a strictly positive probability. Thus 

< 1+ „ 2 (0) > P (n")+Oj 3 (max(A n:(0 ,„ l) ,n 1 ^ 1 )) = O^ (maxf^, A„ ;(0 , ni) )) (7) 



But, on the other hand, < +n2 (0) < EK^^+^faM) = <^M = P {rf) with 
probability 1, what is contradictory with ([7]). 

(ii) Let us denote: Z n (<j>) = E^i»h(^0i). U0) = E^ilfc - ^(0 - 0^)) 2 - (e 4 - 
XJ(</>" — </>2)) 2 ]i 0ni i s tne LS estimator of <£° calculated for i = 1, ••■ ,n%. For £„, we use 
the inequality \a 2 — b 2 \ < (a — 6) 2 , assumption (H2), claim (i), condition © and we obtain: 
M0« 1+ n 2 )l < Ojp(n _( "^ t '~' 5) n ,; ) = Ojpfn^"" 2 "^) = M 1 )- We apply this result in the 
following: = Z n {4>l) = i n (0?) > Z n (j> ni+ri2 ) + *„(0n 1+ n 2 ) > inf*Z n (0) - \o F (l)\. Thus 
\Zn(4> ni J rn2 )\ < | inf</> Z n (<p)\ + ojp(l). Under assumptions (H2) and (H3) we have: 
inf* Z n (0) = (V^ll0 m ~ 0ill) 2 ^r 1 £?=i IIX.II 2 - 2VnI(0 ni - 0i>7 1/2 ££1 e.X, 
= Ojp(1)0(1) - Or (l)o^(l) - Ojp(1) and |£ n (0„ 1+ „ 2 )| = 0*(1). 



> 
Let us denote t s n (cj)) = t„{4») + A„. (ni ,„ 1+ „ 2) Ek=i(l0



l< fc l 7 ) 



/ filYn-L+na.kl 



»5, fc p: 



Z° n {<t>) = Z n (0) + A n;(o , ni) [ELid^r - l^i,*r)]- We can write < +n2 (0) = Z*(0) +*«(<£) ■ 

E^i^f ~ (e< ~ XK0? - 0°)) 2 ] + A„ ;(ril ,„ 1+ „ 2) £Li(l< fc l 7 " I^Ll 7 )- Then ^ i+ „ 2 = 
argmin^(Z£(0)+t£(0)) = aigmin j4£ i+Tl3 (0). But 

rii+n 2 

l*n(0rai+n 2 )l - 2_^ II ( t ) n 1 +n 2 ~ $1 II X^X^ + An;(m , ni +n 2 

i=ni+l fc=l 

and by claim (i), under condition (J2J: 

= P {n-^-^n v ) + 0{r?' 2 )0 P {U s ni+n2 0?||) = P (n-^^) = o F (l). 

Besides 2T*(0°) = i*(0?) = 0, thus: > 1x^(^(0) + **(0)) = Z'(& x+r J + 1« (& l+Ba ) 
^(0^ 1+ n 2 ) - K(l)l > inf ^(0) - |or(l)|. Then 



|^(0 ni+n2 )|< | inf ^(0)|+ O ^(1) 



On the other hand: w£$ Z %(<!>) < mfj, Z n (<f>) + A„ ;(1 ni) inf^, E*Ui (|0, 



l< fe l 7 



inf 



ELi (l 



*?,l 7 



(8) 
But 

-1/2' 



< ELi (l^,fcl 7 - l< fc l 7 ) = -oAWK-^iW) = -Op (V 1 

Since inf^ Z n (<p) = O p (1), we get: inf^ Z^(4>) < — Ojp(l). Replacing this last relation in (0) we 
obtain: \Z^Q,' ni+na )\ = 0,(1). 



Recall that $\hat\omega_{(j_1,j_2)} = |\hat\phi_{(j_1,j_2)}|^{-g}$, where $\hat\phi_{(j_1,j_2)}$ is the LS estimator of $\phi$ calculated on the
observations $j_1 + 1, \dots, j_2$.

Proof of Lemma [5] Since the distribution of e is absolutely continuous, then P[4>{j 1 ,j 2 ),k — 
0] = 0, for every k e {1, • • • ,p}. Consider, in this case by convention ^ n .(j 1 ,j 2 ) ( f'(j 1 ,j 2 ),k = (see 
Potscher and Schneider, 2009). So only the case 4>(j 1 j 2 ),k ^ is considered. 
Let us denote X(j lj2 ) a {J2—ji) x P sub- matrix of X corresponding of the samples i = j'i+1, ••• ,j2, 
f ■ ■ ^ denotes its fcth column and ~V(j lt j 2 ) = (Xi)ji<i<j 2 - 



X fc 



For j u J2 € {0,1, 



i}, let be the set: .4 



«;0'i 



{k e {!,••• ,p};< 



0'i,i 2 ),fe 



^ 0} with the 



index of nonzero components of the adaptive LASSO estimator of calculated to the samples 
jx + 1, • • • , J2- In order to prove the Lemma, we consider two possible cases for an index: it 
belongs or not to this set. 
If k £ A* n , ■ f •,, then by the Karush-Kuhn- Tucker (KKT) conditions (supposing sgn(4> s ,* ■ \ k ) ~ 

+), we have: 2- 1 X n>Uuh) w (juh)}k = X^ th) (Y {jl th) - % j2 )0(J lj2 )) = X^ Ja) (e Ul da) - 

X 'Uuh)&lh,h) *°)) = P {n 112 ) P (n-^)0(n) = P (n^). 

If k 4. A* i ■ ■ \, we have that n 1 ' 2 ^),^ ,■ > converges in law, for n — > oo, to some centered normal 



distribution. Then A T 



A -»,(Ji,j 2 )" 8/2 



Proof of Lemma ® Obviously |^ ljJ2) (0;0°) - ^(0; 0°)| < ^^lELi^Wx,*.),*^.*! 
|0° fe |]|. Using Lemma [5] we obtain: 



sup 

0<ji<J2<n 



32 






< C^(n Q ) + O^(n^) = Op(n-r). 
Proof of Lemma [7] From Lemma [5], $n^{-1} \lambda_n \left| \sum_{k=1}^{p} \hat\omega_{(0,n),k} [|\phi_k| - |\phi^0_k|] \right| < \varepsilon/2$, $\forall n \ge n_\varepsilon$. The rest
of the proof is similar to that of Lemma [2]. ■

Proof of Lemma ® (i) If ||C 1+ „ 2 - <t>°\\ > n^n^, then we have: E^i C(o, ni )(^ ^S) ^ 

O^ (max(V,^)) = P (maxK+V^)) andES?i<(„ 1 ,„ 1+ „ a )(0;^) = P (n v )+ 

Ojp(n^) = Ojp^^+Ojp^"^) = P (n v ). These imply that ^* 1+ „ 2 (0) > OiP (maxK+V 1 * 1 * 1 ) 
with a strictly positive probability. On the other hand, taking into account Lemma[5j A s ^* +n (</>) < 

ESi <( ni>ni+na )(0i; 02) = 0^(n 2 ) + (^^(n^) = 0^(n 2 ) = P {n% Contradiction. 

(ii) We set: t s n *((j>) = i„(0) + A n;(ril , ni+ „ 2) ELl *W 1+ n 2 ),fc(l<M - l< fc l), ^T (</>) = ^„(0) + 

^n;(o,ni)2fe=i ^(o,ni),fe(|^,fc| - |<A?,fc|)) wi th Z n ((f>), t n (4>) as in the proof of Lemma [1 Then 

|02 fe I). The rest of proof is similar to that of Lemma [31 I 



6.2 Proofs of theorems 

Proof of Theorem [1] The proof is split into three steps, using the same technique as in
Ciuperca (2011a):

Step 1. We prove that, under (H1)-(H3), with probability approaching 1, the change-point
estimators are at a distance smaller than $[n^{1/2}]$ from the true values. More precisely, for $\rho \in (\alpha, 1)$
with $\alpha > 1/2$ as in Lemma [1], we have:

$\forall r = 1, \dots, K, \quad P[|\hat l^s_r - l^0_r| > [n^{\rho}]] \to 0, \quad n \to \infty$.    (9)

For this, let us study what happens if we assume that there exists l®. such that \lt — 1®\ > [n p ], 
for each t = 0, 1, • • • , K + 1. In this case, between l® — [n p ] and I® + [n p ] there is not any point 
h, ■ ■ ■ , l K - Then, let be the set: £(p) = {(l u ■■■ , l K );Q < l x < ■ ■ ■ < l K < n, Y^Li \lr-l° r \ < K]}, 
with p £ (a, 1) and let us consider (l\, ■ ■ ■ ,1k) £ £r(p)> wrtn ^r(p) — {(h, ■ ■ ■ Jk)', \h ~ l®\ > 
[ n P],Vt = 1, • • ■ , K}. Thus, for all 7 > 0, we have: 

K+2 

S(h,--- ,lK)>S(h,--- JkJ !,--- ,l a r-i,l a r-[n p },l a r + [n p },l° r+1 ,--- ,l° K )= E L '< ( 10 ) 

t—1 

L t will be defined later. On the other hand S(lf , ■ ■ ■ , l K ) < So with probability one. Recall that 
&0 = YTi=\ £ \ + J2 r =i \i;(i°_ 1 ;i°)Yl P j=i l^rjl 7 an d tne definition of 5 is given by (TJ). For all 
t € {1, • • • ,r — l,r + 1, • • • ,K + 2} let us consider the points k\ yt < ••• < kj(t),t — {h, • • • Jk}<^ 
{j; l°_ 1 < j < Z°} and define: 

,7(t) + l / kj, t p 

L t = J2 n i in E & ~ x 'i(^ - <^)) 2 + v,(*,_ llt ^,o E i<^r 

j=i j y*=fc i _ llt +i fe=i 

Hence, due to the fact the \n-Xk-! t .fc t ) — °{kj,t ~ %-i,t) and using Lemma[TJ 



> -2(if+l)sup 1 <, <J <„ inf J2Li+iVt(ij)((p;<p 



(11) 
-0^(n Q ), 
with $\phi$ of relation (11) one of the true parameters $\phi^0_r$, $r = 1, \dots, K + 1$. For the observations between
$l^0_r - [n^{\rho}]$ and $l^0_r + [n^{\rho}]$ we have:



T — r'r+l n J c-2 _ V^J^j + 1 , ^P IJ.0 17 _ Y^fJ \ Y^P UO |7 



^°+[™ p ] ,.2 _ Y^ J ( r )+ 1 \ v^P I j.0 |7 _ \^ J ( r ) 

^i=lO-[ n p] + l t i l^j = l / V(fcj-l,r-,fcj,r) Z^fc=l IW,fcl Z-fj=l 

lm <p (Z). i L;o-[„p] + i 7 / J S ;(/o_[„p] i /o)(0;0 r ) + EiL;o +1 7 7 i S ;(; ,/o + [„p])(^'; 0r+l)| ■ 

(12) 
Since to left and to right of each change-point the models are different, we suppose \\4> — cpP\\ > c. 

But EiL/o_[„p ]+1 7?j(0;^r) = Op([n p ]) > 0(n 1/2 ). Since A„ ;(/ o_ [ „ P]/ o ) = 0(n 1 / 2 ) and taking 

into account the relation between r\i and rjf, we obtain that: ([12"]) is of order Op([n'']) > 0. Then, 

for HO}, using flTTl) and the last relation, we obtain: S(Ji, ■■■ ,Ik)> -0 P (n a ) + P ([n p ]) + S . 

This last relation implies: 

-Plmin^... i ; K )g£c( p ) S(h, • • • , ?k) > -So] — > 1 for n — > oo, and relation §§§ follows. 

Step 2. We now prove that the change-point estimators are at a distance smaller than $n^{1/4}$ from
the true values: for all $\nu < 1/4$ we have: $P[|\hat l^s_r - l^0_r| > n^{\nu}] \to 0$, for $n \to \infty$, for every $r = 1, \dots, K$.
By Step 1, the change-point estimators belong to the set $\mathcal{L}(\rho)$ with probability tending
to 1 for $n \to \infty$. For $r \in \{1, \dots, K\}$, consider the following subset of $\mathcal{L}(\rho)$:

L c r {y) = {(l u ■ ■ ■ , l K ) € C( P ); \l t - l° r \ > n v ,t = 1, • • • , K}. 

For (l!,..- ,fc) e ££(»/), we have that: S(li,--- ,fc) > S(h,--- ,Zrt, *?,••• Jr-i^r ~ [n%l° + 
[n"],lr+ii " " j ^k)- F° r * 7^ r — 1; r ; by assumption (HI), using the step 1, we have that there are 
at most two points It and lt+i between q and Z° +1 . Suppose that there are two points It and lt+\ 
between I® and ^ +1 . If there is a single point or no point the approach is the same. 

D(ll Z° +1 ) = inf* {Et/ ?+ ife - X',(0 - <p a t+ i)) 2 + A n:(;? , /t) ELi I'M 7 } 

+ inf0 fe=t+ife - X '^ - <th+i)f + *»;<W« + i) ELi I'M 7 } 
+ inf0 [Elt+ife - X iW> - ^+i)) 2 + A n;(**+^ +1 ) ELi I'M 7 } ■ 
Consequently, S(h, ■ ■ ■ JkJi, ■ • • Jr-i^r ~ [ nI/ Mr + [ n "Mr+i! '" Jk) ~ $o can be written as: 

Et*-i,r { D ( l l iU) - Efi« + i £2 - A »;W.'? +1 ) ELl l#, fc p} 

+ (d{iu, i°r - [n v ]) - E l ttl ] + i ^ - ^W-^-KD ELi l<J 7 ) 

+ ^(/ r ° + K], l° r+1 ) e!3o +[ „. ]+ i e? - a„ ;(/ o +KM o +i) ELi l^ + i, fe l 7 ) 

+ (^ft° - [n"],J? + [»»]) - ElLli- !JU ]+1 e? - An-.c/o-Ki^o) ELi l< fe l 7 -A„ ;(W M) ELi l^+i, fe l' 
= A + S + C + L», with D^.!,!" - K]), D(«° + [n^],;° +1 ), D{P r - [n u },l° + [n v ]) sums with 
the same form that D(l°, l® +1 ). 

For A : since \l® — l%\ < Op([n p )), with p < 3/4 and under assumption (HI), taking into account 
Lemma [21^ ii), we have: 

h+i v ( 't+i "1 

D{iii° t+1 )- J2 £2 - A - ; a?A 1 )Ei^r=¥i E < (;t ,/ t+l) (^^+i)Mi+ojp(i)) = o J p(i). 

l= ;?+i fe=i [i=/ t +i J 

Similarly B, C = P (1). 

ForD : D(l° r - [n%l° r + [n v ]) is equal to inf {eI/^^.j^ - X^ - 0°)) 2 
+K;(l°-[n%l°) Efc=l l^fcl 7 + EiLto+l( £ i - X i(0 - ^r+l)) 2 + K;(l°,l°+ln»]) Efc=l I'M 7 }' Since

0, r 7^ <P r +i> ^ <A i s the minimizer of the last relation, there exists a constant c > such that at least 
one of two ||0 — r || or ||0 — r+1 || is greater than c. Let us suppose that it is the first one. Then, 
for ||0-0 r || > c, using Lemma 1 of Babu(1989) we have: EjL/o_ [n >,] ??i ; (io_[„*yo)W>; r ) = ||0- 

^ii 2 e!o_ K] iix 1 ii 2 -2(0-^)'e!Lo_ K] ^x 1 +a„ ;(/ o_ KM o ) (eLi i^r -ELi ie*i 7 ) = 

OjpO") + o{n") = P {n v ), uniformly in 0. So L> = F {n v ) and then 

inf [S(h r -- J K Jl--- ,l° r _ 1 ,l° r -[ri v },l r + [n v },l r + i,--- ,l K )-S ]>O F ([n 1 ']), (13) 

with probability tending to one for n — > oo. This involves: 

inf (;i ,... )(K)e£ o (!/) S , (/i,--- ,Ik,%,-~ ,lr-iA-[n v ],l r + [n v ],l r+l ,--- ,l° K ) > S and the step 2 

is proved. 

Step 3. Now, we can show the theorem: $\hat l^s_r - l^0_r = O_P(1)$ for any $r = 1, \dots, K$.

Let be the set: C{v) = {{h,--- ,l K )]\h-l°\ < [n"],V* = I,--- ,K} with v < 1/4. For a Mi >0 

to determine later, let also the set: C r (v,Mi) = {(h, ■ • • Ik) G C{v);l r — 1° < —Mi}. 

Consider two vectors of change-points (mi,--- , m^) € C{v) and (h,--- ,1k) € £ r (i/, jMi) 

such that m f — It for t ^ r and m r = Z°. Using the notations specified in Section 3, we can 

write: S(l u ■ ■ ■ , l K ) - S(mi, ■ ■ • , m K ) = {E^ r _ 1+1 [(^ - Yl_ lMd ? - (Yj - Y$ r _ ul o )d ) 2 ] + 

Kidr-rM ELi l^ P _ llIp) ,*l' r - ^.iCr-x,!?) ELi l^^l 7 } + {E^+iK^ - ^U +l)>J -) 2 - 

An;(j°,; P+1 ) ELi l^o,; r+1 ),fcl 7 } - {hi + J i2} + {^21} + {/31 + h2}- By Lemma Efti) we have for 
hi- hi = E'=; r _ 1+ i Vj(&[i r _ u i r );$r) ~ Vj(<i>ll r - U l°yi<l>r) = P (1). By Lemmaii), taking 
ni = ? r — l r _i > [n u ], u > 3/4, n>i = Ij — Z r , since A n; (j r _ 1 j r ) = 0((Z r — ^r-1) 1 ^ 2 ) it holds that: 



i' 



h2 = An;(J r _i,J r )E l^(/r-i,/r),fe| 7 ~ 5Z l^r-i ,/?),* H + [ A «; (/,.- 1 ,/,.) ~ A n;(i,- 1 ,1°)] ^2 l^(/.-i,/o),fe| 7 

fe=l fe=l fe=l 

= A n; (/ r _ 1 ,i r )O F (||0(j p _ lj j p ) - 0(j r _ 1) jO)||) + Ojp(An;(j r _ i; j r ) - A n; (j r _ 1 ; ;o)) = Ojp(1) + Ojp(7° - l r ) 

= op(l® — i r ). Similarly, it can be shown that /31 = Op{l) and -Z32 = op(l® — l r )- For J21' 
hi= £ [( £j -X^ +1 -<^)) 2 -e 2 ] + £ [fe-XU0(^ +l) -0?)) 2 

j = Z P + l J = ir + 1 



-( £l -X;.(^ +1 -0°)) 2 ]- f; [(ei-Xj^^o)-^)) 2 -^ 



For J2, Lemma [DJi) combined with the step 2, condition ©, yield that 



= J1 + J2 + J 3 . 



^2 = [||0(J P1 / r+1 ) -^"ll 2 - Il0r+1 -^rll 2 ] X! " Xi H 2 ~ 2 (^(/r,Jr+l) ~ 0r+l)J X! £ ' iXi 

i=! P + l i=! P + l 

= Oj.(n- (u - , '-^(l° - I P )) + Opin'^ 1 ^ - IJ) = Op(n-^^) = o P (l). 
Similarly $|J_3| = O_P(1)$. For $J_1$, since $\phi^0_r \neq \phi^0_{r+1}$, there exists $C_1 > 0$ such that $|J_1| > C_1 (l^0_r - l_r)$.
We choose $M_1 > 0$ such that $|J_1|$ (then $I_{21}$ also) is bigger than $\max(I_{12}, I_{11}, I_{31}, I_{32})$. Then
$\lim_{n \to \infty} P[(\hat l^s_1, \dots, \hat l^s_K) \in \mathcal{L}_r(\nu, M_1)] = 0$ and the theorem is established. ■

Proof of Theorem [2] Since E[Zy] > cj, then for all 6 > there exists M 2 $ > such that: 

iP[| argmin jeZ ZJ r ^| < M 2 j] > 1 — 6. By Theorem!]] for all <5 > 0, there exists A^ j > such 

that for each r = 1, • • • , If: P[\l s r — £"| < A^ g]> 1 — 6. Consider then M = max{7W 1 g,M 2 g}. 
For each \i r \ < M 7 let us consider: 
S(l° 1 +io,---,l°K + iK)-S(l 1 ,---J K ) 



K 



E E 



Yj-Xtf 



3 < P{l° 7 .. 1 +ir-ul<i.+i r ) 



Yj -X^-0 ( ;C Li) 



A' *"+« 



E E {[(^- x ^(^ 1+ ^ 1 ,^)) 2 -^. 



r=l j=;«+i 



(^■- x ^(;«.(« +1 )) 2 -4 



a 

E 



P 



a„,(;o _ 1 +i r _ 1 ,i°+i r ) 2^ l^^.+j^i.ij+i.o.fc 17 



I 7 - ^n,(ig ., ,1°) /, \4> S (l°_ 



k=l 



k=l 



and by Corollary [U = oj>(1) + £* =1 2f'(l + °Wl)) + °pW- So 5(/? + i , ■ ■ ■ ,l° K 



'S'('i) - " )'a) converges jointly in i r in distribution to X)r=i ^j f° r n 
independent set of random variables around each change-point, the theorem follows. 



co. Since we have 



Proof of Theorem [3] We combine Theorems 2 and 3 of Knight and Fu (2000) with Corollary
[1] and Theorem [2] of this paper. ■

Proof of Theorem [4] Consider first K < Kq. By Lemma [TJ we have for a £ (1/2; 3/4): 



> S(«Uo'- • • > 1 'ko,k ) ~ S ° = P (n a )- So § Ko = n^SUl 



AV 



l S K ,K )~ri-iSo + n-iSo = 



Opin 



a— 1\ 



•IV" e 2 



'(1) 



E[a 2 ]. On the other hand, since the distance between 



two consecutive change-points is at least n 3 ' 4 , we prove in the same way as for relation (|13[) . 



that: S(l s 1K ,- 



^k,k) 



So > Cn 3 /\ Then: n(s K - % )s^ 



'A',, 



(S(k,K,--- Jk,k)-So)-(S(11 Ko ,---J s Ko , Ko )-S )\ >0 P (n^)-0 P (n*) = F (n^). 
Due to fact B n < n 3 / 4 we have for K < K , B(K) - B(K ) -A 00. 

n— >oo 

Let us consider now that K > i^o- Thus: So > S(lf K , ••• , JJf #• ) > S(lf K ,--- ,l 8 K K ) > 



^('l,A: ' 



i l K,K> 



'?> ' ' ' 1 'a ) — "So ~~ Op{n v ), with z/ < 1/4. The last inequality is obtained by 



similar calculations of the Theorem [T] proof, step 2. Then < §k — sk — Op{n v ~ l ), which 
implies immediately that: nlogSK — nlogSK = Op{n v ). Taking into account that the function 
G is increasing in K and that n a -C B n -C n 3 ' 4 , we have for the criterion B(K) — B(Kq) = 
-OjpK) + G(K,p)B n - G(K ,p )B n > P (n a ). Then, for n -> 00, F[^„ > iT ] -> 0. ■ 



Proof of Theorem [6] Observe that claim (i) follows immediately from Theorem 2 of Zou
(2006) and Theorem [3].

(ii) Recall first the definition of the two sets: A% ;0 \ — {k £ {!>•'■ i-P};^rfe ^ ^} and 
A* .- - , = {k G {l,--- ,P};kt, ?„, . 7^ 0}. By Theorem 2 of Zou (2006) the adaptive
LASSO estimator in a model without change-points has the oracle properties: linin^cc 1P[A* ,, , -. 
•A*t,a ,o\] — 1- It remains to prove that: linin-s.oo 1P[A* „-„, f- „. = A*,, m ] = 1. The asymp- 

vV— 1' rJ n :VV_l:V / vV— 1' r/ 

totic normality of estimators implies that: dlu — <b 8 X. - , — > 0, Vfc G A* ,~ , - ,. Then 
lim rwoo JP[A* n ~„ r „. C^* ;0 iJ?) ] = 1. We now prove: I 3 [3 fc G {1, • • • ,rf; & g -A^o _ i-! o ) and fc G 
A* ~ - .] — >• 0, for n — > oo. Using the notations given in the proof of Lemma [SJ by KKT op- 
timally conditions: *\#. L J v .)<btf VL j„ )th = X tf°*_J°*)( Y (l" r U%*) ~ X '(f» 1 ,t'»)^'rti,U- t )^ Con " 
sequence of (i): (Z** — Ir-iY^tfu, ^ s ,^ = Oj=»(l). Taking into account conditions imposed to 

A n,(i';- 1 ,ir) w fci,y),t = Vfe-^V) ^ _ p* w 2 i ^ 

(fr-fci) 1/2 " (r-fci) 1/2 "~ l(fc-fci) 1/a % iif „ ))fc l» ™°°' 

On the other hand, in the case (i*-ii^r*) — Cr-ij'r)? we have 



(/«* - ?r-i) 1/2 #* - « 



r-1 

X fe 1 ,i» g ('"?-i- f ") 
(/>-fci) 1/2 

Using the claim (i), the assumptions (H2), (H3) we obtain that the last term converges to the 
sum of two normal distributions. Then, for n — > oo: lP[3k G {1, • • • ,p}; k G" A*, l0 , n and k G 

In the case (^Ii JD 2 Gr-i^")> suppose, without loss of generality, that l^*_ x < 1®_ X < I** < 1° 



(other cases are similar). So, we have the decomposition: XJ 5 . ~ (Y,p» f«*) — ^v«* t,*^^"* ,? s *)) 



X (V-i.^i)^ Y ^-i>^i) _X feii^^^^ 



A„ + D n . As previously, D n /(lp" — Z^-i) 1 / 2 converge to the sum of two normal distributions. For 

Stfw jo )(£* - l s r *-i) 1/2 = A in + A 2n . But A 2n = ojp(l). Combining (H2) and Theorem[S]we 

have: A ln = '- 1 '- 1 ' "'^"l „-.. ,-.. A /3 = O p (1)o p (1) = o P {l). Then 

^4n = Oip(l). Consequently, there exists a constant M > such that lim n _j. oc iP[|A„ + D n \ < 
M] = 0. Hence JPlBk G {1, ■ ■ • ,»}; k <£ A* n0 ,„, and k e A* , f „ ,- ,1 -)■ 0, for n^co. 